 |
Data preparation is one of the most important and often time-consuming aspects of data mining. In fact, it is estimated that data preparation usually takes 50-70% of a project’s time and effort. It is highly dependent on the "understand data" and "understand business" activities, so devoting adequate energy to these earlier activities can minimize this overhead, but you still need to expend a good amount of effort preparing and packaging the data for mining.
Depending on the organization and its goals, data preparation typically involves the following:
• Merging data sets and/or records
• Selecting a sample subset of data
• Aggregating records
• Deriving new attributes
• Sorting the data for modeling
• Removing or replacing blank or missing values
• Splitting into training and test data sets |
|
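The preparation tasks listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a procedure prescribed by the source: the toy records, the field names (`id`, `region`, `age`, `amount`), the `prepare` function, the mean-fill rule for missing ages, the 50.0 spending threshold, and the 25% test fraction are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: one record per customer, many purchases per customer.
customers = [
    {"id": 1, "region": "north", "age": 34},
    {"id": 2, "region": "south", "age": None},  # missing value
    {"id": 3, "region": "north", "age": 51},
    {"id": 4, "region": "south", "age": 27},
]
purchases = [
    {"id": 1, "amount": 20.0},
    {"id": 1, "amount": 35.0},
    {"id": 2, "amount": 15.0},
    {"id": 3, "amount": 60.0},
    {"id": 4, "amount": 5.0},
    {"id": 4, "amount": 10.0},
]


def prepare(customers, purchases, test_fraction=0.25, seed=0):
    """Return (train, test) lists of prepared records."""
    # Aggregate records: total purchase amount per customer id.
    totals = defaultdict(float)
    for p in purchases:
        totals[p["id"]] += p["amount"]

    # Replace missing values: fill unknown ages with the mean known age
    # (an assumed rule; other strategies may suit real data better).
    fill_age = mean(c["age"] for c in customers if c["age"] is not None)

    rows = []
    for c in customers:
        total = totals.get(c["id"], 0.0)
        rows.append({
            **c,                                  # merge customer fields ...
            "age": c["age"] if c["age"] is not None else fill_age,
            "total_spend": total,                 # ... with the aggregate
            "high_spender": total >= 50.0,        # derived attribute
        })

    # Split into training and test sets with a reproducible shuffle.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Sort each partition by id so the output order is stable.
    train.sort(key=lambda r: r["id"])
    test.sort(key=lambda r: r["id"])
    return train, test


train, test = prepare(customers, purchases)
```

Fixing the random seed keeps the split reproducible between runs, which makes it easier to compare models built on the same training and test partitions.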
Relationships
Licensed Materials - Property of IBM. (c) Copyright IBM Corp. 2015.
IBM, the IBM logo, and SPSS are trademarks of International Business Machines Corp,
registered in many jurisdictions worldwide. Other products and service names may be trademarks of IBM or
other companies. You may use the Content "AS IS" or modify them, however IBM will not be responsible for
any deficiencies or errors that result from modifications that you make.
|
|
|